-
Notifications
You must be signed in to change notification settings - Fork 7
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Feat/finalized project monitoring #77
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks an alright approach to me.
The healthcheck is called at a high frequency and snapshotter_last_finalized_check()
takes a random sample of 5 projects and checks their finalized status on the protocol state. Probabilistically, the high frequency checks coupled with random sampling should detect issues practically soon enough.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mostly looks good! Let's just finalize one small thing then I can approve.
) | ||
|
||
if unfinalized_projects: | ||
for unfinalized_project in unfinalized_projects: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Everything looks good! My only concern is that if the node is down for some reason, since core_api
is independent of that, it will keep blasting a lot of failure notifications to Slack, hitting the rate limit. Let's limit the number of failure notifications we send out per minute.
@anomit, what do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes memoize this to have a check on the outflow of notifications
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have added a check to see if an unfinalized project has already been reported to limit the number of notifications in the case where projects cannot finalize. The endpoint will only report a maximum of 5 projects per node every 10 minutes, and if the network goes down, they will only report as many projects that have been assigned to each node.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
Fixes #
Checklist
Current behaviour
Master nodes do not currently have any project state monitoring. Node operators must manually check dashboards or make calls to the protocol state contract in order to check the state of the snapshots.
New expected behaviour
The core-api
/health
endpoint will periodically make calls to the protocol state contract when called from the docker health check. It compares the last finalized epoch for a random sample of projects against the current epoch id, and it will then report any projects that have been unfinalized for more than 50 epochs to Slack.Change logs
Added
snapshotter_last_finalized_check
function to data_utils.py. Used to gather project state from the contract.UnfinalizedProject
data model.Changed
/health
endpoint to report any unfinalized projects from the sample retrieved bysnapshotter_last_finalized_check
.UNFINALIZED_PROJECT
to theSnapshotterReportState
data model.Deployment Instructions
Will need to provide a Slack reporting url in order to make use of this feature if it was not already provided.